Abstract. In this paper, the problem of automatic Gabor wavelet selection for
face recognition is tackled by introducing an automatic algorithm based on the
Parallel AdaBoost method. Incorporating mutual information into the algorithm
makes the selection procedure depend not only on classification accuracy but also
on efficiency. Effective image features are selected by using properly chosen Gabor
wavelets, optimised with the Parallel AdaBoost method and mutual information, to
obtain high recognition rates at low computational cost. Experiments are conducted
using the well-known FERET face database. In the proposed framework, memory
and computation costs are reduced significantly and high classification accuracy
is obtained.



1. Introduction 

Automatic face recognition is a challenging problem: currently achievable levels
of performance are not adequate for universal practical application, and there
remains a need for further work to improve performance and flexibility. Numerous
algorithms have been developed for face recognition since it was shown that
Gabor-type receptive fields can extract the maximum information from local image
regions [1] and that Gabor filters function similarly to the visual neurons of the
human visual system [2]. Therefore, mathematical transforms using Gabor wavelets
(GW) play an increasingly important role in extracting robust features from face
images for classification [3][4][5][6].

Representing images by GW is a difficult problem for two reasons. First, since
GW are not orthogonal, they cannot be used as basis functions, because the
reconstruction coefficients will not be unique for each image. Second, although
exploiting the locality property of GW allows convolution of a GW at each location
of the image to extract detailed local image information, applying GW for all
possible orientations and scales at every location in the image results in an
enormous computational overhead. High computational cost can be avoided by
reducing the feature dimensionality, which requires optimizing the criteria for
selecting GW.

Most existing research studies select Gabor wavelets empirically, rather than
optimally. The challenge that researchers are facing today is how best to exploit
GW to maximize benefits in terms of object recognition performance. This paper
aims to optimize the criteria for selecting GW by running the AdaBoost (AB)
algorithm in a parallel manner and incorporating mutual information (MI) into it.
After giving the theoretical background on GW in Section 2, the AB and Parallel
AdaBoost (PAB) algorithms are explained in Section 3 as means of feature extraction
and selection. Section 4 explains the MI-based GW selection procedures with the AB
and PAB algorithms, respectively. Experimental results in Section 5 are followed by
the conclusion in Section 6.



2. Gabor Wavelets (GW) 

In the spatial domain, the 2D Gabor filter is a Gaussian kernel modulated by a
sinusoidal plane wave [3]:

(1) $\varphi(x, y) = \frac{f^2}{\pi\gamma\eta} \exp\left(-(\alpha^2 x'^2 + \beta^2 y'^2)\right) \exp(j 2\pi f x')$,

where $x' = x\cos\theta + y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$, $f$ is the
central frequency of the sinusoidal plane wave, $\theta$ is the anti-clockwise
rotation of the Gaussian and the plane wave, $\alpha$ is the sharpness of the
Gaussian along the major axis parallel to the wave, and $\beta$ is the sharpness of
the Gaussian minor axis perpendicular to the wave. $\gamma = f/\alpha$ and
$\eta = f/\beta$ are defined to keep the ratio between frequency and sharpness
constant. The Gabor filters, like many other wavelets, can be generated from one
mother wavelet by dilation and rotation. Each filter has the shape of a plane wave
with frequency $f$, restricted by a Gaussian envelope function with relative widths
$\alpha$ and $\beta$. To extract useful features from an image, a set of Gabor
filters with different frequencies and orientations is normally required:

(2) $f_u = \frac{f_{\max}}{(\sqrt{2})^{u}}, \quad \theta_v = \frac{v}{V}\pi, \quad u = 0, \ldots, U - 1, \quad v = 0, \ldots, V - 1.$

As shown in Eqs. (1) and (2), the following parameters need to be determined to
design Gabor filters for feature extraction: the highest peak frequency $f_{\max}$,
the ratios between the centre frequency and the sharpness of the Gaussian along the
major axis, $\gamma$, and along the minor axis, $\eta$, and the numbers of scales
$U$ and orientations $V$. See our previous studies for further theoretical details
on how to select these parameters.
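
The filter bank defined by Eqs. (1) and (2) can be sketched in Python as follows.
This is only a minimal sketch: it assumes the commonly used normalised kernel
$\frac{f^2}{\pi\gamma\eta}\exp(-(\alpha^2 x'^2 + \beta^2 y'^2))\exp(j 2\pi f x')$,
and the function names `gabor_kernel` and `gabor_bank`, the 31-pixel kernel size
and the defaults $f_{\max} = 0.25$, $\gamma = \eta = \sqrt{2}$ are illustrative
assumptions rather than values taken from this paper.

```python
import numpy as np

def gabor_kernel(f, theta, gamma=np.sqrt(2), eta=np.sqrt(2), size=31):
    """Complex 2D Gabor kernel sampled on a size x size grid (assumed form)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinates anti-clockwise by theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    # gamma = f / alpha and eta = f / beta fix the sharpness for each frequency.
    alpha, beta = f / gamma, f / eta
    envelope = np.exp(-(alpha ** 2 * xr ** 2 + beta ** 2 * yr ** 2))
    carrier = np.exp(2j * np.pi * f * xr)
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * carrier

def gabor_bank(f_max=0.25, U=5, V=8, **kwargs):
    """Bank of U x V kernels with f_u = f_max / sqrt(2)^u, theta_v = v*pi/V."""
    return {(u, v): gabor_kernel(f_max / np.sqrt(2) ** u, v * np.pi / V, **kwargs)
            for u in range(U) for v in range(V)}

bank = gabor_bank()  # 40 kernels indexed by (scale u, orientation v)
```

Indexing the bank by the pair (u, v) keeps the correspondence between a kernel and
its scale and orientation explicit, which is convenient when individual wavelets
are later selected.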



3. Feature Extraction and Selection Using GW 

The aim is to use GW to extract unique features uniformly across all images so
that these features can be compared for face recognition. A common approach is
to convolve each image with the same set of GW. The number of Gabor wavelets
used for this varies with the application, but usually 40 filters ($U = 5$ scales
and $V = 8$ orientations) are chosen empirically for face recognition applications
[3]. Specifically, given a bank of 40 GW
$\{\varphi_{u,v}(x, y),\ u = 0, \ldots, 4,\ v = 0, \ldots, 7\}$, image features at
different locations, frequencies and orientations can be extracted by convolving
the image $I(x, y)$, locally, with the GW:
$O_{u,v}(x, y) = |I * \varphi_{u,v}|(x, y)$. The feature set thus consists of the
results of the local convolution of the image $I(x, y)$ with all of the 40 GW:

(3) $S = \{O_{u,v}(x, y) : u \in \{0, \ldots, 4\},\ v \in \{0, \ldots, 7\}\}.$

A Gabor feature vector can be obtained by concatenating the rows (or columns)
of $O_{u,v}(x, y)$ for all $u, v$ to represent the image:
$G(I) = O = (O_{0,0}, O_{0,1}, \ldots, O_{4,7})$, where $G(\cdot)$ is the Gabor
feature extraction operation. As an example, for an image of size 64 x 64, the
Gabor feature vector has 64 x 64 x 5 x 8 = 163,840 dimensions, which is very
large. Because of the large number of convolution operations, the computation
cost is also necessarily high.
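
The dimensionality argument above can be checked with a short Python sketch that
convolves a 64 x 64 image with all 40 wavelets and concatenates the magnitude
responses. The kernel form, the 31-pixel kernel size, the FFT-based convolution
helper `conv2_same` and the parameter defaults are assumptions made for
illustration only; a random array stands in for a face image.

```python
import numpy as np

def gabor_kernel(f, theta, gamma=np.sqrt(2), eta=np.sqrt(2), size=31):
    """Complex Gabor kernel (assumed normalised form, illustrative size)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-((f / gamma) ** 2 * xr ** 2 + (f / eta) ** 2 * yr ** 2))
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * np.exp(2j * np.pi * f * xr)

def conv2_same(image, kernel):
    """Linear 2D convolution via zero-padded FFT, cropped to the image size."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + H, left:left + W]

def gabor_features(image, f_max=0.25, U=5, V=8):
    """Concatenate |I * phi_{u,v}| over all scales u and orientations v."""
    feats = []
    for u in range(U):
        for v in range(V):
            k = gabor_kernel(f_max / np.sqrt(2) ** u, v * np.pi / V)
            feats.append(np.abs(conv2_same(image, k)).ravel())
    return np.concatenate(feats)

image = np.random.default_rng(0).random((64, 64))  # placeholder for a face image
G = gabor_features(image)
print(G.shape)  # (163840,)
```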

Instead of performing a convolution operation at every image location using
all 40 GW, it is more sensible to select only the relevant GW and to convolve them
with the image at appropriate positions. Two questions arise from this
consideration: first, which wavelets should be used and, second, at which image
locations. To address these questions, we have previously developed an approach
that uses the AB algorithm to select GW based not only on the location parameter
of the GW, but also on their orientation and frequency parameters. In this study,
we improve this approach further by introducing the MI concept to the feature
selection procedure and parallelizing the AB algorithm.

3.1. Parallel AdaBoost (PAB) Algorithm. Briefly, the AB algorithm iteratively builds
a trainable model $M$ using a linear superposition of different realizations. The
base model $M$ is re-trainable using different weight combinations,
$w = (w_1, w_2, \ldots, w_N)$, where $N$ is the number of samples [8, 9]. After each
training step, the weights are updated according to the classification performance
of the previous step over the training data. The weights of misclassified points,
$y_i = -1$, are increased and the weights of correctly classified points,
$y_i = +1$, are decreased accordingly [8]. Therefore, at each step $n$ there is an
associated model instance $M_n$. The final hypothesis/model is the linear
superposition of all these model instances.
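
A single reweighting step of this kind can be sketched as below. The sketch follows
the standard discrete AdaBoost update described in [8]; the function name
`adaboost_reweight` and the boolean `correct` mask are illustrative choices, not
notation from this paper.

```python
import numpy as np

def adaboost_reweight(w, correct, eps=1e-12):
    """One AdaBoost step: returns the new weights, model weight and error.

    w       : current sample weights
    correct : boolean mask, True where the current model classifies correctly
    """
    w = np.asarray(w, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Weighted error of the current model instance, kept away from 0 and 1.
    err = np.clip(w[~correct].sum() / w.sum(), eps, 1 - eps)
    c = 0.5 * np.log((1 - err) / err)
    # Misclassified points are up-weighted, correct ones down-weighted.
    w_new = w * np.exp(np.where(correct, -c, c))
    return w_new / w_new.sum(), c, err

w_next, c, err = adaboost_reweight([0.25, 0.25, 0.25, 0.25],
                                   [True, True, True, False])
```

After the update, the misclassified sample carries half of the total weight, so the
next model instance is forced to focus on it.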

The AB algorithm is computationally expensive. In particular, for any "hard"
point, the distribution of the associated weights appears to converge to a
definite, stable distribution as the number of steps of the AB algorithm grows to
infinity [9]. PAB aims to decrease the computational cost by approximating these
asymptotic distributions. It has been shown that the weight parameters can be
modelled well by Gamma distributions with suitable parameters [10]. Using early
estimates of the weights, one can construct a system of distributions from which
the AB weights can be drawn, instead of waiting for the sequential output of each
step. Once the weight distributions $\gamma_i$ are modelled by the Gamma
distribution

(4) $\gamma_i = \frac{x^{\alpha - 1} e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}},$

the weights are updated independently and randomly from this distribution, where
the values of $\alpha$ and $\theta$ are obtained from the mean, $\mu$, and the
variance, $\sigma^2$, of the weights over the first $S$ steps of their evolution.
These variables are related by

(5) $\mu = \alpha\theta$ and $\sigma^2 = \alpha\theta^2$.
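
Solving Eq. (5) for the Gamma parameters gives $\alpha = \mu^2/\sigma^2$ and
$\theta = \sigma^2/\mu$. The sketch below applies this moment matching to a stored
weight history and then draws new weights; the function names, the array layout
(one row per serial step, one column per sample) and the renormalisation of the
sampled weights are assumptions made for illustration.

```python
import numpy as np

def fit_gamma_moments(weight_history, tiny=1e-18):
    """Per-sample Gamma parameters from an (S, N) array of stored weights."""
    h = np.asarray(weight_history, dtype=float)
    mu = h.mean(axis=0)
    var = np.maximum(h.var(axis=0), tiny)  # guard against zero variance
    alpha = mu ** 2 / var                  # from mu = alpha * theta
    theta = var / mu                       # and sigma^2 = alpha * theta^2
    return alpha, theta

def sample_weights(alpha, theta, rng):
    """Draw independent weights from the fitted Gammas and renormalise."""
    w = rng.gamma(alpha, theta)
    return w / w.sum()

alpha, theta = fit_gamma_moments([[1.0, 2.0], [3.0, 6.0]])
w = sample_weights(alpha, theta, np.random.default_rng(0))
```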

3.1.1. P-Boost Algorithm. Given the data set $E = \{(x_i, y_i)\}_{i=1}^{N}$:

(1) Initialize the weights $w_i(1) = 1/N$, $i = 1, \ldots, N$.

(2) Run AdaBoost for $S$ steps and keep the weights for each step, $w_i(n)$,
    $n = 1, \ldots, S$.

(3) For $i = 1, \ldots, N$, estimate the distribution $\gamma_i$ from the
    previously stored weights $w_i(n)$.

(4) PARALLEL COMPUTATION STARTS HERE
    For each value of $n \in \{S + 1, \ldots, T\}$, do the steps below in parallel:
    (a) For $i$ running over the data set, generate random and independent weights
        $w_i^*(n)$ by sampling the corresponding $\gamma_i$;
    (b) Train the base model $M$ using the weights $w^*(n)$, giving the model
        instance $M_n$;
    (c) Compute the model error $\epsilon_n$;
    (d) Compute the model weight $c_n$:
        $c_n = \frac{1}{2}\ln\left(\frac{1 - \epsilon_n}{\epsilon_n}\right)$.
(5) Compute the output hypothesis

    $H(x) = \sum_{n=1}^{T} c_n M_n(x)$.

As can easily be seen, after step 3 new values can be assigned to the weights not
by following the standard AB algorithm, but by randomly and independently sampling
the respective Gamma distribution models. This leads to a dramatic reduction in
computational cost without loss of classification accuracy, because the dynamics of
the stochastic process are preserved.

3.2. Selecting Gabor Wavelets Using PAB. We simplify the task of selecting GW
for feature extraction from a multi-class face recognition problem to a two-class
problem: selecting GW that are effective for intra- and extra-person space
discrimination. GW selected in this way should be robust for face recognition, as
intra- and extra-person space discrimination is one of the major difficulties in
face recognition.

The transition from a multi-class to a two-class problem is based on a method
proposed in [11], which reformulates the face recognition problem as a two-class
problem. Two spaces are defined: the intra-person space, which measures
dissimilarities between faces of the same person, and the extra-person space, which
measures dissimilarities between faces of different people. We define the intra-
and extra-person spaces as

(6) $\text{Intra} = \{|G(I_p) - G(I_q)|,\ I_p \sim I_q\}$,
    $\text{Extra} = \{|G(I_p) - G(I_q)|,\ I_p \nsim I_q\}$,

where $I_p$ and $I_q$ are the facial images of persons $p$ and $q$, respectively.
Intra- and extra-person space discrimination is thus a two-class problem, and to
use the PAB algorithm for selecting GW the training set is
$\Xi = \text{Intra} \cup \text{Extra}$. Samples in the intra-person space are
regarded as positive examples, whilst those from the extra-person space are
regarded as negative examples. Each weak classifier can be defined on one Gabor
wavelet, such that the weak classifier determines the class of a vector based on a
feature extracted from the vector using just this one Gabor wavelet. Selected weak
classifiers (and therefore the corresponding GW) are effective in discriminating
the intra- and extra-person classes, and should be used to extract features for
face recognition. Recall that each component of a vector in $\Xi$ is associated
with a Gabor wavelet, i.e., it is obtained by convolving an image with a Gabor
wavelet, $f_j = |G(I_p) - G(I_q)|_j$; therefore, a weak classifier can be defined
as a simple threshold function on a component of the vector as

(7) $h_j(x) = \begin{cases} +1, & f_j(x) < \lambda_j \\ -1, & \text{otherwise,} \end{cases}$

where $\lambda_j$ can be determined by the intra-person sample mean and the
extra-person sample mean

(8) $\lambda_j = \frac{1}{2}\left[\frac{1}{m}\sum_{p=1}^{m} f_j(x_p \mid y_p = +1) + \frac{1}{l}\sum_{q=1}^{l} f_j(x_q \mid y_q = -1)\right],$
where $m$ and $l$ are the numbers of intra- and extra-person samples, respectively.
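
The construction of the two-class training set and of the threshold can be
sketched as follows. This is an illustrative sketch only: it enumerates all image
pairs, whereas the experiments in this paper sample the extra-person pairs
randomly, and the function names and the toy feature vectors are hypothetical.

```python
import numpy as np
from itertools import combinations

def difference_samples(features, person_ids):
    """Absolute feature differences for all image pairs.

    Pairs of the same person are labelled +1 (intra), all others -1 (extra).
    """
    X, y = [], []
    for p, q in combinations(range(len(person_ids)), 2):
        X.append(np.abs(features[p] - features[q]))
        y.append(1 if person_ids[p] == person_ids[q] else -1)
    return np.array(X), np.array(y)

def threshold(X, y, j):
    """Midpoint between the intra- and extra-person means of component j."""
    return 0.5 * (X[y == 1, j].mean() + X[y == -1, j].mean())

def weak_classify(X, j, lam):
    """Small differences are assigned to the intra-person class (+1)."""
    return np.where(X[:, j] < lam, 1, -1)

# Toy data: four "feature vectors" of two persons (ids 0 and 1).
features = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
X, y = difference_samples(features, [0, 0, 1, 1])
lam = threshold(X, y, 0)
```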

In each PAB or AB iteration, the space of all possible weak classifiers is searched
exhaustively to find the weak classifier that produces the lowest classification
error. The error is then used to update the weights such that the wrongly
classified samples receive more focus. The resulting strong classifier is a
weighted linear combination of all the selected weak classifiers. The PAB and AB
algorithms select hundreds of features and weak classifiers to form the final
strong classifier.
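
Putting the P-Boost steps and the exhaustive weak-classifier search together gives
the following sketch. It is a simplified illustration under stated assumptions:
threshold stumps with mean-midpoint thresholds serve as the base model, the
"parallel" phase is written as an ordinary loop whose iterations are independent
and could be dispatched to separate workers, and all function names and defaults
are hypothetical.

```python
import numpy as np

def train_stump(X, y, w):
    """Exhaustive search over features; returns (feature, threshold, error)."""
    lam = 0.5 * (X[y == 1].mean(axis=0) + X[y == -1].mean(axis=0))
    pred = np.where(X < lam, 1, -1)
    errs = (w[:, None] * (pred != y[:, None])).sum(axis=0) / w.sum()
    j = int(np.argmin(errs))
    return j, lam[j], float(errs[j])

def stump_predict(X, j, lam):
    return np.where(X[:, j] < lam, 1, -1)

def model_weight(err, eps=1e-10):
    err = min(max(err, eps), 1 - eps)
    return 0.5 * np.log((1 - err) / err)

def p_boost(X, y, S=5, T=20, seed=0):
    rng = np.random.default_rng(seed)
    w = np.full(len(y), 1.0 / len(y))            # step (1)
    history, models = [], []
    for _ in range(S):                           # step (2): serial AdaBoost
        history.append(w.copy())
        j, lam, err = train_stump(X, y, w)
        c = model_weight(err)
        models.append((c, j, lam))
        w = w * np.exp(-c * y * stump_predict(X, j, lam))
        w /= w.sum()
    h = np.array(history)                        # step (3): fit Gamma models
    mu, var = h.mean(axis=0), np.maximum(h.var(axis=0), 1e-18)
    alpha, theta = mu ** 2 / var, var / mu
    for _ in range(S, T):                        # step (4): independent rounds
        w_n = rng.gamma(alpha, theta)
        w_n /= w_n.sum()
        j, lam, err = train_stump(X, y, w_n)
        models.append((model_weight(err), j, lam))
    return models

def predict(models, X):
    """Step (5): sign of the weighted sum of model instances."""
    score = sum(c * stump_predict(X, j, lam) for c, j, lam in models)
    return np.where(score >= 0, 1, -1)
```

Because the rounds in step (4) depend only on the fitted Gamma parameters and not
on each other, they can be executed in any order or concurrently.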



4. Mutual Information Usage in Boosting Algorithms 

The PAB and AB algorithms select only features that perform "individually"
best, and the redundancy among selected features is not considered. To eliminate
redundancy, MI can be used. Before a new weak classifier is selected, the MI
between the new classifier and those already selected is examined to make sure
that the information carried by the new classifier has not been captured before.
At stage $T$, where $T - 1$ weak classifiers
$\{h_{v(1)}, h_{v(2)}, \ldots, h_{v(T-1)}\}$ have been selected, the function
measuring the MI between a candidate classifier $h_j$ and the selected classifiers
can be defined as

(9) $M(h_j) = \max_{t} I(h_j, h_{v(t)}), \quad t = 1, 2, \ldots, T - 1.$

Each weak classifier is now considered as a random variable. The estimation of
MI between two such variables, e.g. $r_1$ and $r_2$, requires information about the
marginal distributions $p(r_1)$, $p(r_2)$ and the joint probability distribution
$p(r_1, r_2)$, where $p(\cdot)$ represents probability. Though a Gaussian
distribution could be assumed, many of the features might not be Gaussian. To
reduce the complexity and computation cost of the feature selection process, we
therefore focus on binary random variables only, i.e. $r_1 \in \{-1, +1\}$,
$r_2 \in \{-1, +1\}$. For binary random variables, the probabilities can be
estimated by simply counting the number of occurrences of each case and dividing
that number by the total number of training samples. The value of $M(h_j)$ can be
used directly to determine whether the new classifier is redundant or not. The
value is compared with a pre-defined threshold $\delta_{MI}$; if it is larger than
$\delta_{MI}$, we can deduce that the information carried by the classifier has
already been captured. Besides MI, the classification error of the weak classifier
is also taken into consideration, i.e., only those classifiers with small
classification errors are selected. The features thus selected are uncorrelated
with each other and are therefore non-redundant.
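
The counting estimate of MI for binary classifier outputs and the redundancy test
can be sketched as follows. The function names are hypothetical, MI is measured in
bits (base-2 logarithm) as an arbitrary choice, and the threshold value used in the
example is purely illustrative.

```python
import numpy as np

def mutual_information(r1, r2):
    """MI (in bits) between two {-1, +1} output vectors, estimated by counting."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((r1 == a) & (r2 == b))   # joint probability
            p_a, p_b = np.mean(r1 == a), np.mean(r2 == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

def max_mi(candidate, selected):
    """Largest MI between a candidate and any already selected classifier."""
    return max((mutual_information(candidate, s) for s in selected), default=0.0)

def is_redundant(candidate, selected, threshold):
    return max_mi(candidate, selected) > threshold

h_a = [1, 1, -1, -1]      # outputs of an already selected classifier
h_b = [1, -1, 1, -1]      # outputs statistically independent of h_a
print(is_redundant(h_a, [h_b], threshold=0.1))       # False
print(is_redundant(h_a, [h_b, h_a], threshold=0.1))  # True
```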

Fig. 1 shows the first and last six GW selected using the MI-enhanced PAB
algorithm. It is interesting to see that most of the selected Gabor features are
located around prominent facial features such as the eyebrows, eyes, nose and chin,
which indicates that these regions are more robust against the variation in
expression and illumination encountered within the database subset. This result is
consistent with the fact that the eye and eyebrow regions remain relatively stable
when a person's facial expression changes. Recall that the selection criterion is
the ability of the GW to discriminate between the intra- and extra-person classes.

Figure 1. First 6, last 6, and position of 200 selected wavelets.



5. Experiments and Results 

We use a subset of 600 images from the FERET database to test the Gabor feature
selection algorithm using AB, PAB and MI. Two images of each subject are randomly
chosen for training, and the remaining one is used for testing. The selected
400 face images (2 images for each subject) are first used to train the boosting
algorithms (AB and PAB) to select GW for intra- and extra-person space
discrimination. As a result, 200 intra-person difference samples and 1,600
extra-person difference samples are randomly generated for training [5].

Although the use of MI makes the required training time longer than that of the
original AB, the PAB algorithm reduces the computational cost, so that the training
time required by MI with PAB is always lower than that of AB with MI. If the
computational cost of the AB algorithm is $O(T)$, the cost of the PAB algorithm is
$O(S) + (T - S) \cdot O(1)$, where the number of serial iterations $S$ in PAB is
chosen smaller than the total number of iterations $T$ in AB, $S < T$.
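
One way to read this cost expression is in terms of the number of training rounds
that must be executed one after another. The small sketch below is an interpretation
rather than a result from this paper: it assumes every training round costs the same
and that the $T - S$ independent rounds are spread evenly over a hypothetical number
of workers.

```python
import math

def sequential_rounds(T, S, workers):
    """Rounds on the critical path: S serial rounds plus parallel batches."""
    return S + math.ceil((T - S) / workers)

print(sequential_rounds(200, 50, 1))    # 200, the same as plain AdaBoost
print(sequential_rounds(200, 50, 150))  # 51
```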

The normalized correlation distance measure and the nearest neighbor classifier
are used. Table 1 shows the recognition performance on the 200 test images, where
the highest accuracies achieved by the three algorithms are 93%, 95% and 96% for
AB, AB+MI and PAB+MI, respectively. Since the MI values for all of the first 60
features are quite small, the effect of mutual information on the selection process
is not obvious initially. However, once the number of features increases, AB and
PAB start to pick up highly redundant features, while the use of mutual information
reduces the redundancy and improves the recognition rate. In PAB, the first 50
iterations are processed as in AB; the algorithm is then parallelized, with the
weights in each iteration selected randomly and independently from the model built
using the weight dynamics of the first 50 iterations. Compared with AB, PAB not
only reduces the computational cost dramatically but also appears to converge
quickly to the reference model.
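
The matching step can be sketched as below. The sketch assumes that the normalized
correlation distance is one minus the cosine of the angle between two feature
vectors (without mean subtraction); the function names and the toy gallery are
illustrative only.

```python
import numpy as np

def normalized_correlation_distance(a, b):
    """1 - <a, b> / (||a|| ||b||); zero for vectors pointing the same way."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest_neighbor(probe, gallery, labels):
    """Label of the gallery vector closest to the probe."""
    d = [normalized_correlation_distance(probe, g) for g in gallery]
    return labels[int(np.argmin(d))]

gallery = np.array([[1.0, 0.0], [0.0, 1.0]])  # one selected-feature vector per subject
labels = ["subject_a", "subject_b"]
print(nearest_neighbor([2.0, 0.1], gallery, labels))  # subject_a
```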

Table 2 shows the recognition rates of the PAB+MI algorithm for different values
of S. Note that for all values of S < T = 200, recognition rates are slightly
higher than in the AB+MI case, at lower computational cost. The results indicate
that the PAB+MI method achieves the best results for various values of S and
converges quickly with respect to the AB+MI case. Note that a few sequential steps
are sufficient for PAB+MI to attain performance comparable with the reference
AB+MI, showing that the weight dynamics are captured well by the Gamma
distribution.

Table 1. Face recognition rates (%) for various dimensions of the feature set.
AB: AdaBoost, AB+MI: AdaBoost with Mutual Information and PAB+MI:
Parallel AdaBoost with Mutual Information.

Feature Dimension    AB      AB+MI   PAB+MI, S=50
20                   77.5    77.5    77.5
40                   82.0    82.0    82.0
60                   86.0    86.0    86.0
80                   87.5    91.5    91.5
100                  91.0    92.5    93.5
120                  92.0    93.5    96.0
140                  93.0    94.5    96.0
160                  93.0    93.5    95.5
180                  92.5    95.0    94.5
200                  92.5    93.5    93.0

Table 2. PAB+MI recognition rates (%) for different values of the serial
iteration number S.

Feature Dimension    S=50    S=70    S=100   S=150
20                   77.5    77.5    77.5    77.5
40                   82.0    82.0    82.0    82.0
60                   86.0    86.0    86.0    86.0
80                   91.5    90.0    91.5    91.5
100                  93.5    91.5    92.5    92.5
120                  96.0    92.5    93.5    93.5
140                  96.0    96.0    95.5    94.5
160                  95.5    94.5    96.0    93.5
180                  94.5    96.0    95.0    94.0
200                  93.0    95.5    94.5    91.0


6. Conclusion 

The locality property of GW has both advantages and disadvantages. A positive
aspect is that it allows the extraction of local features, while a more negative
aspect is its computational complexity, due to uncertainty in the parameter
selection process. In this paper, we have discussed the effect of GW parameters on
face recognition performance and the selection of GW for face recognition. We have
introduced, step by step, the development of a GW selection method optimised
for face recognition. These developments have demonstrated very encouraging
results when investigated in a practical scenario, as applied to the FERET face
database. Work such as that reported here is important in demonstrating how PAB and
MI techniques can be used to improve the effectiveness, efficiency and reliability
of face recognition in biometrics-related applications.






BAGCI AND BAI 



7. Acknowledgements 

This research is funded by the European Commission FP6 Marie Curie Action
Programme (MEST-CT-2005-021170) under the CMIAG (Collaborative Medical
Image Analysis on Grid) project.

References 

[1] K. Okajima, "Two-dimensional Gabor-type receptive field as derived by mutual information
maximization," Neural Networks, vol. 11, no. 3, pp. 441-447, 1998.

[2] J. G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation
optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A -
Optics, Image Science, and Vision, vol. 2, no. 7, pp. 1160-1169, 1985.

[3] L. Shen and L. Bai, "Face recognition based on Gabor features using kernel methods," in Proc. of
the 6th IEEE Conference on Face and Gesture Recognition, Korea, 2004, pp. 170-175.

[4] L. Shen, L. Bai, and M. Fairhurst, "Gabor wavelets and general discriminant analysis for face
identification and verification," Image and Vision Computing, vol. 25, no. 5, pp. 553-563, 2006.

[5] L. Shen and L. Bai, "AdaBoost Gabor feature selection for classification," in Proc. of Image and
Vision Computing, New Zealand, 2004, pp. 77-83.

[6] K. Messer, J. Kittler, M. Sadeghi, M. Hamouz, A. Kostin, F. Cardinaux, S. Marcel, S. Bengio,
C. Sanderson, N. Poh, Y. Rodriguez, J. Czyz, L. Vandendorpe, C. McCool, S. Lowther,
S. Sridharan, V. Chandran, R. P. Palacios, E. Vidal, L. Bai, L. Shen, Y. Wang, Y. H. Chiang,
H. C. Liu, Y. P. Huang, A. Heinrichs, M. Müller, A. Tewes, C. v. d. Malsburg, R. Würtz,
Z. G. Wang, F. Xue, Y. Ma, Q. Yang, C. Fang, X. Q. Ding, S. Lucey, R. Goss, and
H. Schneiderman, "Face authentication test on the BANCA database," in Proc. of International
Conference on Pattern Recognition, Cambridge, UK, 2004, pp. 523-532.

[7] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic
bunch graph matching," in Proc. 7th Intern. Conf. on Computer Analysis of Images and Patterns,
CAIP'97, Kiel, number 1296.

[8] Y. Freund and R. Schapire, "A short introduction to boosting," Journal of Japanese Society for
Artificial Intelligence, vol. 14, no. 5, pp. 771-780, 1999.

[9] S. Merler, B. Caprile, and C. Furlanello, "Parallelizing AdaBoost by weights dynamics,"
Computational Statistics and Data Analysis, vol. 51, no. 5, pp. 2487-2498, 2007.

[10] M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman
distances," in Computational Learning Theory, 2000, pp. 158-169.

[11] P. J. Phillips, "Support vector machines applied to face recognition," in Proceedings of the
1998 conference on Advances in neural information processing systems II, Cambridge, MA, USA, 1999,
pp. 803-809.

Collaborative Medical Image Analysis on Grid (CMIAG), The University of
Nottingham, Nottingham, UK

E-mail address: ulasbagci@ieee.org
URL: www.ulasbagci.net



